ABSTRACT

This paper describes the process of rapid iterative prototyping used by a research team developing a training 
video game for the Sirius program funded by the Intelligence Advanced Research Projects Activity (IARPA). 
Described are three stages of development, including a paper prototype, and builds for alpha and beta testing.
discussed along with implications of the rapid iterative prototyping approach.

INTRODUCTION

While under constant pressure for quick and 
accurate judgments, intelligence analysts must 
gather information from a variety of sources, 
and rapidly process it incrementally as it is 
received. In his book The Psychology of
Intelligence Analysis, Heuer (2006) refers to this
process as “a recipe for inaccurate perception” 
(p. 27). As part of their work, intelligence 
analysts must not only evaluate the credibility 
of information they receive, they must also 
attempt to synthesize large quantities of data 
from a variety of sources, including intelligence 
collection assets from their own organizations, 
and from other agencies. 

In their “Sirius” program, the Intelligence 
Advanced Research Projects Activity (IARPA)
posed a challenge for researchers to create 
a training video game capable of prompting 
players to recognize cognitive biases within 
their decision making, so as to mitigate their 
occurrence during critical stages of intelligence 
analysis (IARPA, 2011). IARPA set forth a 
number of requirements for game development, 
including mandating which cognitive biases 
should be examined, while leaving research 
teams open to determining the form and content 
of their games, as well as the key theoretical 
mechanisms underpinning their design and 
function. Our team’s response was to develop 
a game called MACBETH (Mitigating Analyst 
Cognitive Bias by Eliminating Task Heuristics) 
in which players are challenged to gather and 
assess intelligence data to stop an imminent 
terrorist attack within a fictional environment. 

This paper provides a design narrative 
(Hoadley, 2002) of a rapid prototyping, user- 
centered approach to developing MACBETH. 
It builds on (1) design narratives of rapid
prototyping approaches to game design for learning
(cf. Aldrich, 2003; Jenkins, Squire, & Tan,
2004; Squire, 2008, 2010, 2011), (2) models 
of rapid prototyping within instructional design 
(Desrosier, 2011; Jones & Richey, 2000; Tripp 
& Bichelmeyer, 1990), and (3) modern versions 
of entertainment game design (Lebrande, 2010) 
to articulate an integrated approach to designing
games for learning. This approach addresses the
requirements that training games must (1) have
mechanics appropriate to the target domain, (2)
satisfy the needs of multiple stakeholders,
and (3) be backed by evidence that the games
are achieving their intended impact without
causing unforeseen negative consequences.
Before proceeding with these development issues,
however, we first provide a brief overview
of cognitive bias.

THEORETICAL APPROACH 
TO COGNITIVELY BIASED 
INFORMATION PROCESSING 

A primary causal mechanism cited for biased
information processing and poor credibility
assessment is the reliance on heuristic social
information processing, a nonanalytic orientation in
which only a minimal set of informational cues
is considered as long as processing accuracy
is deemed sufficient. As defined by Chaiken’s
Heuristic-Systematic Model of information 
processing (HSM; Chaiken, 1980; Todorov, 
Chaiken, & Henderson, 2002), heuristics are 
mental shortcuts, or simple decision rules,
arising from conventional beliefs and expectations
used repeatedly in daily interactions. In contrast 
to heuristic processing, systematic information 
processing requires more careful consideration 
of all available evidence, and is thus much more 
cognitively taxing (Chen & Chaiken, 1999). 

The HSM posits that reliance on heuristics
is often preferable because it minimizes
cognitive effort while satisfying motivational 
concerns with sufficient reliability. Heuristics 
often provide swift solutions to complex,
ill-structured problems (Silverman, 1992; Van
Boven & Loewenstein, 2005); however, reliance
on heuristics can also lead to insufficient
consideration and/or disregard of relevant,
diagnostic information. Consequently, although
heuristics do not always lead to bias, an overreliance
on them can result in decreased soundness
of credibility assessments. According to the
HSM, motivation, time, and ability to process
information are critical elements for reducing
analytical reliance on heuristic processing, and
encouraging more optimal systematic, deliberative
processing.

Unfortunately, although there is a vast research
literature documenting the existence of
cognitive biases, the literature on the mitigation
of cognitive biases, especially in the area of
credibility assessment, is scant (Silverman, 1992).
Because credibility assessment is cognitively 
demanding (Vrij, Fisher, Mann, & Leal, 2008), 
decision-makers are prone to adopt mental 
decision rules that require less cognitive effort. 
However, within the context of intelligence 
analysis, the present project offers an immersive
training game to help analysts recognize when 
cognitive heuristics are disadvantageous, and 
encourage them to engage in more systematic 
processing so as to reduce the incidence of 
biased reasoning within their decision making. 

Few systematic attempts to mitigate cognitive
bias through comprehensive training programs
exist, with a few exceptions in medical
education (Stone & Moskowitz, 2011) and law
enforcement (van den Heuvel, Alison, & Crego,
2012); however, the effectiveness of training
programs, especially over the long term, is unknown
(Neilens, Handley, & Newstead, 2009). Using
games for training, especially in the realm of
intelligence analysis or decision-making, is a
relatively new approach. For example, Crews
et al. (2007) describe a web-based, learner- 
centered, multimedia training system called 
AGENT99 Trainer that was more successful 
at teaching people to detect deception than 
traditional pedagogical techniques although it 
was not a game per se. Games have been used 
to train in many different domains and a recent 
meta-analysis by Sitzmann (2011) found that 
computer-based simulation games were more 
likely than control conditions to improve factual 
knowledge and skills with a higher retention 
rate. The main advantage that games provide 
is replayability. While a student is unlikely to 
hear a lecture, lesson, or instructional video 
more than once, they can receive repeated 
exposure to training when implemented in a 
video game. They can also receive feedback 
on their mistakes more quickly and learn from
those mistakes without suffering real-world
consequences. Thus, a videogame is an ideal 
medium to train about cognitive biases which 
operate outside of conscious awareness and 
may be resistant to training. 

The IARPA Sirius program asked us to 
address three particular cognitive biases in the 
first phase of the program, namely: the fundamental
attribution error (FAE), confirmation
bias (CB), and bias blind spot (BBS). The FAE 
is the tendency for people to over-emphasize 
stable, personality-based (i.e., dispositional) 
explanations for behaviors observed in others, 
while simultaneously under-emphasizing the 
role and power of transitory, situational influences
on the same behavior (Harvey, Town, &
Yarkin, 1981). People engage in CB when they 
tend to search for and interpret information in 
ways that serve to confirm their preconceived 
beliefs, expectations, or hypotheses (Nickerson, 
1998). Similarly, BBS is a form of selective 
perception characterized by the tendency to see 
bias in others, while being blind to it in one’s self 
due to a failure in introspection (Pronin, 2007; 
Pronin & Kugler, 2007; Pronin, Lin, & Ross, 
2002). In our approach to developing methods 
for mitigating these three biases within the 
MACBETH video game, we applied the HSM 
to examine ways players may be encouraged to 
engage in systematic processing while simultaneously
limiting their reliance on heuristics.

OVERVIEW OF MACBETH 

Our response to the IARPA challenge of building
a training video game to train intelligence
analysts to mitigate cognitive bias was to create
a fictional environment in which players are 
confronted with a global terrorist threat they 
must prevent, as a clock counts down the time 
remaining before the imminent attack. Players 
take on the role of an intelligence analyst who 
must gather intelligence from assets around the 
world in order to assess the credibility of the 
information, try to determine the accuracy or 
truthfulness of the intelligence, and then apprehend
the suspect before the attack. Players must
determine the location of the terrorist attack,
the identity of the suspect, and the method of 
attack (i.e., the weapon used). The game was 
originally designed as a single-player game to 
be played against other non-playable character
(NPC) analysts but was later expanded to
allow for human players to collaborate with 
one another to solve the scenarios together. 
MACBETH has four distinctive sections (Intel
Collection, Archive, Case Files, and Intel
Review), and one game tool, the Notebook,
used for tracking and organizing information
gathered for determining location, suspect, and
weapon. Each game section is available at least
once per turn; however, turns always end with
Intel Review, where players make a hypothesis
or assist other players in making hypotheses.
Below, the four game sections and Notebook 
tool are each described in turn. 

Intel Collection 

In Intel Collection, players collect two pieces of 
intelligence per round from up to six different 
sources (each holding three pieces of information).
For example, players can request an analysis
of satellite data on suspect movements, find
out about internet chatter regarding suspicious
local activities, or ask an asset about dubious
money transfers possibly aiding a known terrorist
group. In response to the two questions
posed, players are presented with answers of
varying specificity and reliability that can
be used to confirm or disconfirm locations,
methods of attack, and/or attributes of a suspect.
Certain pieces of intel that are lacking in
specificity may require further investigation,
in which case a “chip”, obtained in the Archive
section of the game, is necessary.

Archive 

In Archive, the player is given the profile of an 
unidentified subject and asked to determine if 
that person represents a potential threat. Players
are told to review the case files and determine
whether the person should have been considered
a threat at the time their case was active. Threat
determinations are informed by collecting clues
based on either dispositional or situational
information. Players may collect up to 12 cues
(6 dispositional and 6 situational) upon which 
they must make their “threat” or “not a threat” 
assessment. If a correct assessment is made 
based entirely on situational clues (thus avoiding 
the FAE), a “chip” is earned, which can be used 
later in the game to follow up on ambiguous or
under-specified intelligence gathered during Intel
Collection. The purpose of Archive is to help
players reduce FAE; thus, MACBETH makes
dispositional cues (e.g., “has a quick temper”) 
less useful in identifying threats than situational 
cues (e.g., “is currently a subject of interest in 
a police investigation”). When justifying their 
position, players must indicate the top three 
pieces of evidence they used to inform their 
assessment, and only correct assessments based 
on three situational clues earn points (toward 
their overall game score) and a chip that will 
later prove useful in verifying ambiguous intel. 
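
The chip rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the game's actual implementation: the names `Cue` and `earns_chip` are hypothetical, and the two example cues are the ones quoted in the text.

```python
from dataclasses import dataclass
from typing import Sequence

SITUATIONAL = "situational"
DISPOSITIONAL = "dispositional"


@dataclass(frozen=True)
class Cue:
    """A single Archive cue (hypothetical structure, for illustration only)."""
    text: str
    kind: str  # SITUATIONAL or DISPOSITIONAL


def earns_chip(assessment_correct: bool, top_three: Sequence[Cue]) -> bool:
    """Return True only when a correct threat assessment is justified
    entirely by situational cues, as the Archive rule is described."""
    if len(top_three) != 3:
        raise ValueError("players must indicate exactly three cues")
    all_situational = all(cue.kind == SITUATIONAL for cue in top_three)
    return assessment_correct and all_situational


# Example cues quoted in the paper.
police = Cue("is currently a subject of interest in a police investigation", SITUATIONAL)
temper = Cue("has a quick temper", DISPOSITIONAL)

print(earns_chip(True, [police, police, police]))   # correct, all situational
print(earns_chip(True, [police, police, temper]))   # correct, but one dispositional cue
```

Under this reading, a correct assessment that leans on even one dispositional cue earns no chip, which is what ties the reward to avoiding the FAE rather than to accuracy alone.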

Case Files 

In Case Files, players can browse information 
related to suspects, weapons, and potential 
attack locations. For example, if players learn 
from a source during intel collection that a 
suspect has a history of depression, they can 
read the case files on the suspects to learn 
which ones may be taking drugs for depression. 
Once a player reviews information in Case Files,
that information is automatically added to the 
Notebook and is subsequently available for later 
decision making. 

Intel Review 

In Intel Review, players are faced with two 
potential interactions, one of which involves 
the ultimate submission of a final hypothesis. 
In their initial visits to Intel Review, players are 
encouraged to formulate a running hypothesis, 
or assist a fellow analyst (and thereby earn 
points) by providing new evidence. In both cases
the player must select evidence collected during
Intel Collection to justify the hypothesis or the
assist. When assisting a fellow analyst, the game
encourages the use of disconfirming evidence
through both feedback and point awards. On
their second visit to Intel Review, players are
offered a choice between requesting confirming
or disconfirming evidence from a fellow
analyst, and this information can be used to
justify a change to the player’s hypothesis. Just
as Archive encourages the selection and use of
situational clues to mitigate FAE, Intel Review
encourages the selection and use of disconfirming
evidence to mitigate CB.

Intel Review requires players to interact
with an AI whose behavior differs considerably
between the single-player and multi-player
versions of the game. In the single-player version of the
game, the other analysts (NPCs named Hal 
and Joshua) are not very interactive and it is 
clear to participants that they are not making 
decisions or hypotheses collaboratively. The 
player can request to receive either confirming 
or disconfirming intel from the other analysts 
and can either disconfirm the AI hypothesis or 
justify his or her own hypothesis. The player 
receives feedback based on this choice. When 
the player submits a final hypothesis they gain 
points based on correct items and a bonus 
for the turn in which it was submitted. If the 
player had insufficient evidence to prove the 
hypothesis they receive a penalty. Thus, in the 
single-player version of the game, the AI was 
simply sending intel to the player and was not 
directly communicating with the player. 

For the multi-player version, the player 
can play either with another human partner or 
can play with AI that was designed to simulate 
human actions based on the playtests of human 
players. A player can request assistance from 
another analyst through intel received in a 
dropbox. The dropbox is a messaging system 
used for communication between each player 
in the multi-player version. A player fulfilling a
request receives feedback based on the type of 
intel submitted. When a player submits a final 
hypothesis, it must be approved or rejected 
by the other player. To reject a hypothesis, a 
player must submit disconfirming intel, which 
then appears in the other player’s dropbox. If a
hypothesis is approved, the submitting player
receives a bonus. If a hypothesis is rejected, 
the rejecting player receives points and the 
submitting player receives a penalty. Both 
players share the final approved hypothesis, 
gain points based on correct items, and gain 
points based on the round of the player who 
is farthest in the scenario. The multi-player AI
is more interactive and collaborative than the
single-player AI.
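
The approval and rejection flow for a final hypothesis in the multi-player version can be sketched as follows. This is a minimal illustration under stated assumptions: the point values are placeholders (the actual values are not reported here), and `Player`, `dropbox`, and `resolve_final_hypothesis` are hypothetical names rather than the game's own.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder point values; the real values are not given in the text.
APPROVAL_BONUS = 10
REJECTION_REWARD = 5
REJECTION_PENALTY = 5


@dataclass
class Player:
    name: str
    score: int = 0
    dropbox: List[str] = field(default_factory=list)  # messages from the other player


def resolve_final_hypothesis(submitter: Player, reviewer: Player,
                             approved: bool,
                             disconfirming_intel: Optional[str] = None) -> str:
    """Apply the outcome of the other player's review of a final hypothesis."""
    if approved:
        submitter.score += APPROVAL_BONUS
        return "approved"
    # A rejection is only valid when backed by disconfirming intel,
    # which is then delivered to the submitter's dropbox.
    if not disconfirming_intel:
        raise ValueError("a rejection must include disconfirming intel")
    submitter.dropbox.append(disconfirming_intel)
    reviewer.score += REJECTION_REWARD
    submitter.score -= REJECTION_PENALTY
    return "rejected"


analyst_a = Player("Analyst A")
analyst_b = Player("Analyst B")
outcome = resolve_final_hypothesis(analyst_a, analyst_b, approved=False,
                                   disconfirming_intel="Suspect was abroad that week")
print(outcome, analyst_a.score, analyst_b.score, analyst_a.dropbox)
```

Requiring intel for every rejection is the point of the design: a player cannot veto a partner's hypothesis without producing disconfirming evidence, which is the behavior the game rewards to mitigate CB.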

The Notebook 

The Notebook is not a section of the game, but 
rather a tool for the player. In the Notebook, 
players collect information from Case Files 
that later can be consulted and rated as more 
intel becomes available. Up to the moment 
of final hypothesis submission, the Notebook 
holds all the intel collected, organizing it both 
by source, and by type—i.e., locations, suspects 
and weapons. When players are ready to submit 
their hypotheses during Intel Review, suspects, 
locations and weapons can be selected from 
the Notebook to form running hypotheses, and 
eventually a final solution to the game. 
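
One way to picture the Notebook's dual organization (by source and by type) is a small container that indexes each item twice. The sketch below is an assumption about structure, not a description of the game's code; `Intel`, `Notebook`, and the sample entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Intel:
    source: str  # e.g., "Case Files" or a named asset
    kind: str    # "location", "suspect", or "weapon"
    text: str


class Notebook:
    """Holds collected intel, retrievable by source or by type."""

    def __init__(self) -> None:
        self._by_source: Dict[str, List[Intel]] = defaultdict(list)
        self._by_kind: Dict[str, List[Intel]] = defaultdict(list)

    def add(self, item: Intel) -> None:
        # Index the same item under both keys so either view stays complete.
        self._by_source[item.source].append(item)
        self._by_kind[item.kind].append(item)

    def by_source(self, source: str) -> List[Intel]:
        return list(self._by_source.get(source, []))

    def by_kind(self, kind: str) -> List[Intel]:
        return list(self._by_kind.get(kind, []))


notebook = Notebook()
notebook.add(Intel("Case Files", "suspect", "History of depression"))
notebook.add(Intel("Satellite analysis", "location", "Movement near the harbor"))
print([item.text for item in notebook.by_kind("suspect")])
```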

RAPID ITERATIVE 
PROTOTYPING PROCESS 

In the game development process known as 
rapid iterative prototyping (RIP), the game is 
played, evaluated, adjusted, and played again, 
allowing the design team to optimize adjustments
to successive iterations or versions of the
game (Eladhari & Ollila, 2012). The purpose of 
RIP is to gather feedback on the design while 
production is underway so that changes can be 
made during the development process rather 
than waiting until the game is finished. The 
rapid development cycle allowed by RIP serves 
to leverage input from intelligence and cognitive
bias experts on the project team, who are
able to provide continuing feedback on design, 
playability, and coherence while simultaneously 
making sure the theoretical mitigation goals of 
the project are maintained during each iteration.

At present, MACBETH has been through
more than 100 builds, and has seen dramatic
changes accrue over its first year of development.
This process may be broken down into
three distinct phases. The first involved
the “Paper Prototype,” created as a cartoon
version of the game, capable of illustrating
the basic concepts and mechanics within the
game to our cognitive and intelligence
subject-matter experts (referred to as CSE and ISE
respectively) as well as the IARPA review team.
During the second phase of the development 
process, termed the “Alpha Phase,” the Archive 
mini-game was developed and then tested 
with a small group of student playtesters, who 
continued to offer feedback as new features 
of the game were added with each prototype 
developed. Finally, after assuring the validity 
and design specifications of MACBETH during
the “Beta Phase” (Gold & Wolfe, 2012), a
game build prototype was developed for use in 
our first experiment, assessing game mechanics 
and functionality with over 700 users (Dunbar, 
Miller, et al., 2013). 

Phase 1: Paper Prototype 

A primary goal of prototyping in design research
is to understand the problem of a design
through cycles of design and feedback, rather 
than through speculation. Just as importantly, 
the paper prototype was intended to begin a 
conversation among all project staff, including
game designers, subject matter experts, cognitive
psychologists, and measurement specialists,
about project goals informed by tangible materials
rather than speculative goals. In short, rather
than spend months discussing what a game
could be, and debating different visions free
of any tangible products or shared experience,
we wanted to encourage grounded discussion
around a shared object (see Boling & Bichelmeyer,
1998). In the first designs written for
the IARPA proposal, it was felt the game might
appear too much like deskwork for an intelligence
analyst (see Figures 1 and 2), which might
lessen engagement (Schoenau-Fog, 2011). We
contracted with a team of designers to help
make the game more engaging; they were able
to take our initial idea and collaborate with the
development team, ISE, and CSE personnel to
add elements based loosely on the board game
Clue, in which players would have to identify a
suspect, weapon, and location for the fictional
attack in a turn-based style. Through several
face-to-face working sessions with cognitive 
researchers and game developers, the design 
team proposed five high-level game concepts 
that leveraged unique game mechanics to allow 
players to learn about and diminish the targeted 
biases within both single- and multi-player 
game versions. 

Figure 1. Early design of MACBETH game

Figure 2. The first paper playtests of MACBETH

Drawing from common practices in visual
design, we created five initial designs, rather
than “one master design,” in order to: (1) unearth
multiple plausible game designs and prevent
the team from becoming too stuck on one idea;
(2) uncover implicit functional specifications
that may not have been made explicit; and (3)
create a cohesive vision for the ultimate design
which would work from multiple perspectives.
After careful consideration of the strengths
and weaknesses of each design, the design team
moved forward with the development of a paper
prototype concept merging elements of three of
the initial designs into a single play experience.
One of these concepts was called “Threat!,” a
mini-game addressing the FAE that was later
developed into Archive, and integrated as one
component of MACBETH, so that the rest of
the main game could focus on the BBS and
CB training via the use of a hypothesis testing
approach within what eventually became the
Intel Review portion of the game. Following two
months of development, the design team had
a working prototype of MACBETH (designed
for four players at a time) available for the CSE
and ISE teams, and IARPA review personnel
to play, and consequently provide feedback to
the design team through RIP.

Consistent with an RIP approach, both the 
ISE and CSE teams had early and persistent 
roles in playtesting to assure realism and verify
that the bias elicitation
mechanisms were performing as planned. The
CSE team also reviewed the efficacy of the
instructional material in informing players of
mitigation strategies for each bias, and assisted
in balancing the game feedback systems to 
reward mitigation of bias and redress players 
who committed a bias. All researchers were 
provided access to scheduled builds of the 
game, and asked to play and provide feedback 
via an on-line discussion board. Although they 
helped provide insights on usability and design 
elements, the ISE and CSE teams focused 
primarily on the implementation of content- 
dependent game systems. 

The development team subsequently analyzed
feedback from the CSE and ISE teams
following the paper playtest, and although many
suggestions for improvement were offered, one
of the most important lessons learned was that
people did not like being rushed to make a
hypothesis too early in the game. Players wanted
time to gather intelligence and weigh the information
further before making their hypotheses;
thus, the original prototype requiring a guess
after the first turn was modified. Relative to bias
mitigation, the concern was that players could
not commit CB unless they indeed had a relatively
firm hypothesis in mind; otherwise there
would be nothing to confirm or disconfirm. On
the other hand, players who waited until the end 
of the game to make a hypothesis would never 
have a chance to commit—or mitigate—CB or 
BBS. So, as a result of this playtest iteration,
a mechanic was developed to institute “commitment
points,” encouraging players to make
an early hypothesis, while rewarding them for
retaining their original positions unless they had
contradictory evidence, all before ultimately
being allowed to make a final hypothesis only
after their third turn.

A second important piece of feedback 
from these early playtest sessions involved 
the dossiers of the suspects in the main game. 
Players wanted to see more variety, both in 
terms of gender and background. Indeed, if the 
suspects were all Muslim extremists, that could 
easily encourage stereotypes and inadvertently 
increase both CB and BBS. Thus the development
team worked with the CSE and ISE teams
to create dossiers of the suspects, as well as old 
case files for Archive, which could provide a 
richer, more diverse body of potential suspects 
and threats for players to engage and analyze. 

A third important lesson from the paper 
playtests involved the turn-based approach to 
the game. When playing a board game with 
other players face-to-face, it is possible for 
one player to have a turn while other players 
observe, but even when players know what 
their opponents taking a turn are doing, they 
may grow impatient waiting for their turn. In 
a digital game, when players are unable to see 
what others are doing, this may become even 
more critical as players are more prone to impatience
and boredom awaiting their own turn.
Consequently, the game mechanic was altered 
to make the turn-taking simultaneous, so that 
within a single-player version of the game, 
artificial intelligence could be employed to 
interact with the players by creating two NPCs 
as fellow analysts, named “Hal” and “Joshua.” 

This development sped up the action by 
allowing a player to be informed that Hal or 
Joshua would like assistance in testing their 
hypotheses, without that assistance
interrupting the player’s own turn or requiring
the player to wait for the NPCs to take
their turns. In a later multi-player version of
MACBETH, turn-taking remained simultaneous
and the dropbox system was implemented
so that play could continue even while the other 
players were making decisions. 


In sum, the paper prototyping phase enabled 
the team to minimize project risks by identifying
critical tensions in the game design (i.e.,
managing sufficient investment in hypotheses),
bringing team members together on a common 
goal, and providing evidence for the potential 
of a game to reach its intended learning goals. 

Phase 2: Alpha Testing 

The development team took over the project 
from the design team, and began converting the 
paper prototype into a digital game. The first 
aspect of the digital version was the Archive 
mini-game, designed to teach players about FAE
and help reduce the reliance on dispositional 
attributes (see Figure 3). The CSE team took 
the lead on creating the case files and chose 
historical figures about which the fictitious 
intelligence agency would have old case files. 
Thus 32 distinct Archive case files were created 
based on biographical information about a range 
of characters, from serial killers and terrorists, 
to historical heroes and literary figures. Based 
on the feedback received during the paper 
playtests, these profiles varied in age, gender, 
country of origin, and religious affiliation, with 
the objective being to choose characters players
would be familiar with, so as to produce an
“aha” moment upon deeming someone to be a
threat or not a threat. Calling Mother Teresa a
threat, for example, should accentuate players’ 
realizations that they had indeed used faulty 
decision-making techniques. Thus, MACBETH 
attempts to make players aware of their biases 
by making the dispositional cues non-diagnostic 
and misleading, whereas the situational cues 
are diagnostic and useful. However, the cues 
and characters within Archive needed to be as 
factually correct as possible, which required a 
great deal of research. 

The first independent playtest population
during Alpha Testing consisted of undergraduate
students who had signed up for a semester-long
course in which they spent several weeks
learning about the educational capabilities of
videogames, and playing builds of MACBETH
as they became available. Student playtesters
(N = 13) made one-hour appointments using 
online appointment scheduling software, and 
had one-on-one sessions with a graduate research
assistant, during which they played the
most current build of MACBETH. The research 
assistants observed their game play and noted 
specific in-game behaviors. After completing 
a play session, students responded to questions 
and metrics designed to elicit feedback about 
their play experience, which were recorded, 
summarized, and incorporated back into the 
next build of the game. The CSE teams also 
played each week’s build and made content 
recommendations based on feedback from 
playtesters, as well as their own experiences. 
In-game behavior was also observed by the
development and CSE teams using Morae
Observer software; upon completing a
playtest session, researchers assigned to facilitate
the playtest coded each student playtester’s
in-game behaviors for markers of bias elicitation,
mitigation, and performance feedback.

In addition to the playtest feedback on 
Archive, the dispositional and situational cues
were administered to a larger sample (N = 130)
of undergraduate students with a brief definition
of the terms “dispositional” and “situational” to
determine whether the Archive cues were clearly
identifiable as such. From those results, cues
that playtesters were unable to distinguish as
dispositional vs. situational were revised for easier
identification. Out of the 374 possible cues, any
cue that 50% or more of the sample mistakenly
categorized, or reported uncertainty as to its
dispositional vs. situational nature, was revised.
As a result, nearly 100 cues were modified
before being incorporated into Archive.
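
The revision criterion amounts to a simple proportion check per cue. The sketch below is one hedged reading of it: it assumes that miscategorizations and reports of uncertainty are counted together against the 50% threshold, which the text does not state explicitly, and the function name and example counts are hypothetical.

```python
def needs_revision(n_miscategorized: int, n_uncertain: int,
                   n_respondents: int, threshold: float = 0.5) -> bool:
    """Flag a cue for revision when the share of respondents who either
    miscategorized it or were unsure of its category meets the threshold.

    Assumption: the two kinds of response are pooled before comparison.
    """
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    problem_share = (n_miscategorized + n_uncertain) / n_respondents
    return problem_share >= threshold


# Illustrative counts for a sample of 130 respondents (not data from the study).
print(needs_revision(40, 25, 130))  # 65 of 130 is exactly 50%, so the cue is flagged
print(needs_revision(30, 20, 130))  # 50 of 130 is about 38%, so the cue is kept
```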

One of the most important changes made 
to Archive during Phase 2 was that players 
were asked to rank the top three cues that led
them to their threat assessment, so it could be 
determined exactly what information they were 
using to make their decisions. If they correctly 
identified a threat but did so using dispositional 
cues, they would get a message informing them 
they were relying on cognitive biases despite 
making the correct threat assessment. Analysis 
of the players’ trials in Archive indicated they 
were often winning chips while relying on 
dispositions; thus, the decision was made to
only award chips when situational cues were
selected as the players’ top three choices. This
way, players would not be rewarded for relying
on dispositions, and thus, FAE would not be
encouraged. It was also determined that players
would need chips in order to win so that they
could not skip the FAE training in Archive while
playing MACBETH.

Qualitative data were collected throughout 
the course of the Alpha testing to assess player 
affect (i.e., whether the player had positive or 
negative game experience), software perfor¬ 
mance (how responsive, stable, and scalable 
the software was), usability (perceptions of the 
interface and navigation), and game interactivity 
(i.e., how much the player interacted with the 
game). Overall, players responded positively 
to the game concept, and showed they enjoyed 
the investigative and educational nature of 
the game, indicating that “pretending to be an 
agent was fun.” Some users saw a connection 
between media outside the game and the game 
content, and the familiarity made the game more 
enjoyable. For example, one user associated the 
game with television crime shows and said she 
put herself “in the shoes of the psychologist 
from Law and Order: SVU and approached 
the information as if I were taking the case to 
trial.” Negative comments on the concept of 
the game centered on the repetitive nature of 
determining threat. At this phase, however, players
were only testing Archive; thus, the negative
comments were due to the iterative nature of
the playtesting and the limited game content
available during Alpha Phase.

Comments regarding programming issues
were used to detect bugs inadvertently
overlooked by the design team. For instance,
some users repeatedly chose situational cues but
corresponding feedback advised them to “look
for situational cues.” Despite the game being
programmed to avoid repetition of suspects, some
players were given the same suspects multiple 
times. Oftentimes the game would not enter full 
screen mode, or graphics would cover a portion 
of text, etc. Discovering software glitches like 
these during review sessions within the RIP 
process allowed the design team to improve 
the technical quality of the game for each suc¬ 
cessive build. 
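
One common way to guarantee that suspects are not repeated is to shuffle the pool once and deal from it, rather than sampling independently on each trial. The sketch below illustrates that idea only; the paper does not describe how MACBETH selects suspects, and the class name `SuspectDeck` and its methods are hypothetical.

```python
import random


class SuspectDeck:
    """Deal suspects in random order without repeating any of them."""

    def __init__(self, suspects, seed=None):
        self._rng = random.Random(seed)
        # Copy the pool so the caller's list is left untouched, then shuffle once.
        self._remaining = list(suspects)
        self._rng.shuffle(self._remaining)

    def draw(self):
        """Return the next unseen suspect, or raise if the pool is exhausted."""
        if not self._remaining:
            raise LookupError("no unseen suspects left")
        return self._remaining.pop()


deck = SuspectDeck(["suspect A", "suspect B", "suspect C"], seed=1)
seen = [deck.draw() for _ in range(3)]
print(sorted(seen))
```

Because each suspect is removed from the pool when drawn, a repeat can only occur if the pool itself contains duplicates.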

In addition to detecting technical problems, 
participants also commented on the game’s 
aesthetics and informative content. The 3-D
graphics received mixed reviews. For example,
one user said he “liked the graphics, they were
better than what I expected,” while another
complained she didn’t care for the 3-D and 
“immediately wanted to change it.” A couple 
users indicated the 3-D was “distracting” and 
“difficult” at first, but once they got used to it, 
one “liked it” whereas the other felt “it didn’t 
add anything to the game.” 

To determine the ways players interacted
with the game, users were observed by graduate
assistants and asked how they made decisions
regarding threat. The purpose of Archive was
to teach players to use situational cues over
dispositional cues in decision making, yet
58% of participants either admitted or were
observed using other techniques to make decisions.
Some players approached the task as a
“guessing game” instead of using deductive 
skills. One player described this tactic as a 
“matching game” where the user “was able to 
recognize and guess the characters’ identities 
from the clues” before making a choice. Only 
27% of users read the game feedback and 
realized they should be using situational cues 
over dispositional cues (the remaining 15% of
playtesters did not indicate how they made decisions).
Thus, the challenge for the design team
centered on increasing players’ interactions 
with the feedback—encouraging them to read 
and learn from the instructions and feedback 
provided by integrating the feedback into the 
course and narrative of the game, rather than 
having it function as a “toll booth,” slowing 
down play, and requiring them to break the 
flow of the experience. 

Once Archive was complete, the development
team focused on improving the rest
of MACBETH (i.e., the main game), which
included: the Intel Collection section, in which
players can query sources and collect answers
to questions about the suspects, weapons, and
locations under investigation; the Case Files
section, where players can read background
on the suspects, weapons, and locations; the
Intel Review section, where players can try out
different hypotheses and request help from the
other analysts; and finally, the Notebook, where
players are able to keep track of their intel in
preparation for submitting their hypotheses.

In addition to the first group of student 
playtesters described above, three additional 
tests with various groups of players were 
conducted. In the second on-campus playtest,
faculty and graduate students were recruited
from the University of Oklahoma. In the third
playtest, nineteen intelligence officers with at
least five years of experience were recruited
from a variety of governmental agencies for
a day-long session, which included multiple
play-throughs of the game. The final playtest
involved playtesters selected by the IARPA project
management team and also incorporated
multiple play-throughs.

After each playtest session, players participated
in one-on-one feedback sessions with a
researcher, then engaged in focus groups during
which a moderator asked questions to stimulate 
discussion. Overall, playtesters in the three 
groups expressed positive feedback regarding 
the changes in Archive and the analytical concept 
of the game. For example, when commenting
on Archive, one playtester noted, “All of the
profiles in archive... those were interesting. I
like the archive exercises, choosing whether
or not people were a threat was fun. I liked the
idea that this was a mini game in the whole 
game.” Similarly, in response to the question: 
“What was your favorite part of the game?” 
one playtester commented, “Trying to put 
together all of the options. The analytic part, 
it works for me.” 

In addition to noting which parts of MACBETH
they liked, focus group participants
also suggested numerous refinements.
For example, at each playtest, players were
presented with a new way of orienting themselves
to the game, along with instructions on alternative
ways of navigating it. During each session,
playtesters noted how difficult and confusing the
game was, with its steep learning curve. Thus,
a variety of training methods were evaluated,
including a short video tutorial and the use of
brief printed instructions. Initially, however, the
tutorials proved to be frustratingly short or were
presented too quickly; hence, the learning curve
remained steep, as one playtester noted: “Some
of the sections were not intuitive.... Wasn’t
clear what I was supposed to do or why there
were choices.... Two minute video didn’t cut
it...” When asked for a suggestion on how to
make the game more intuitive, this playtester
responded: “Some mechanics need to be simplified
or explained. I hate to say fix it with a
tutorial, but it needs to be more intuitive and
obvious. Figure out how to fix the mechanics to
be self-teaching and obvious like tooltips.” In
response, a tutorial scenario (called “Scenario
Zero”) was developed to function as a walkthrough
of a shortened and simplified game
scenario. In Scenario Zero there are only three
weapons, locations, and suspects, and the AI
walks players through two turns, demonstrating
all the important elements, including the basics
of Archive, Intel Collection, the Notebook, and
Intel Review.

One build of the game had a “sandbox” 
feature where players could test and explore 
different hypotheses, but playtesters found it
caused more confusion than it alleviated since
they could not transfer their sandbox hypothesis
to the actual Intel Review section. A more
pressing problem was that while playing in 
the sandbox, players were not receiving any 
bias mitigation training and thus, it served as a 
distraction from the main purpose of the game. 
Subsequent builds replaced this feature with a 
new hypothesis session within Intel Review. 

Another aspect of the game receiving an
overhaul after the focus group playtests was the
Notebook. In the original design, it was a
separate area available only from the main menu
of the game (see Figure 4), which meant players
could not access it while navigating within
Intel Collection, Case Files, or Intel Review.
One player commented, “[I would] like to see 
info from suspect bios easier to access. Would 
like to be able to drag a clue from one side of 
notebook to the other... to keep track of which 
clues [led] to which decision.” This and similar 
comments were instrumental in changing the 
Notebook in successive builds, so as to make it 
accessible on every screen, and thus available 
to players throughout the game (see Figure 5). 
Figure 4. Original design of main page for game

Figure 5. Notebook re-design


In addition to playtester feedback regarding
Notebook accessibility, players were observed
taking copious notes on paper while they played.
“Notebook is not intuitive,” one player commented,
“I need paper.” Although their paper
notes were helpful in revealing how players 
were copying details from the case files page 
on the weapons, locations, and suspects, they 
ultimately took players out of the flow of the 
game. A key focus group finding was that
players wanted more options for organizing
intel within the Notebook so they could read 
profiles more easily while seeing suspect data 
simultaneously. Based on these observations and 
feedback, the Notebook section was revised so 
that as players visited Case Files, the intel they 
accessed on weapons, locations, and suspects 
automatically populated in their Notebook. 

Players’ paper notes also revealed that they were
rating their confidence in the validity of intel as
they played, and flagging pieces of intel they thought
to be particularly important. During the focus
group sessions, players said they wanted to
gauge their certainty about intel validity so they
could more easily keep track of what they’d
learned and which pieces would be most useful.
Regarding how realistic the game was,
one intelligence officer stated, “You need a 
credibility assessment for every piece of intel. 
This game doesn’t have that. The collector does 
this.” Another stated, “You will never totally be 
sure about an answer. So you have to play on 
levels of certainty, and that is very important for 
an analyst to understand.” Based on these and
similar pieces of feedback, subsequent builds of
the Notebook included both a flag function and
a slider for players to assess their confidence in
the credibility of the information accumulated
during Intel Collection (see Figure 5).
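
The revised Notebook behavior (automatic population, a flag, and a confidence slider) can be pictured as a small data structure. The sketch below is hypothetical and is not taken from the game's source: the names `IntelEntry`, `Notebook`, `record_visit`, `flag`, and `set_confidence` are invented for illustration, the 0 to 100 confidence scale is an assumption, and intel from Case Files and from Intel Collection is treated as a single entry type for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class IntelEntry:
    text: str
    category: str          # e.g. "suspect", "weapon", or "location"
    flagged: bool = False  # player marked this item as important
    confidence: int = 0    # player's confidence in credibility (assumed 0-100)


@dataclass
class Notebook:
    entries: dict = field(default_factory=dict)

    def record_visit(self, key, text, category):
        """Auto-populate an entry the first time the player views the intel."""
        # setdefault keeps any flag or confidence already set on a revisit.
        self.entries.setdefault(key, IntelEntry(text, category))

    def flag(self, key, on=True):
        self.entries[key].flagged = on

    def set_confidence(self, key, percent):
        # Clamp the slider value to the assumed 0-100 range.
        self.entries[key].confidence = max(0, min(100, int(percent)))


notebook = Notebook()
notebook.record_visit("weapon-1", "Background on the first weapon", "weapon")
notebook.flag("weapon-1")
notebook.set_confidence("weapon-1", 60)
```

Keeping the flag and confidence value on the entry itself means the player's annotations survive repeated visits to the same piece of intel.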

Phase 3: Beta Testing 

Once Alpha Testing had demonstrated the
game to be both playable and engaging, the
CSE and development teams began manipulating
key variables and game mechanics within
MACBETH to test the efficacy of the game
at mitigating cognitive bias. Although the
experimental design and results of the first
experiment are beyond the scope of this paper
(they are reported elsewhere; see Dunbar et
al., 2013), the primary purpose of Experiment
One was to test the difference between explicit
training, in which players are given overt definitions
of the three biases along with priming
about those biases throughout the game, and
implicit training, in which players may only
learn the bias terminology (such as the difference
between dispositional and situational cues
or confirming and disconfirming information)
through the process of playing the game. Two
additional experiments, testing the efficacy of
the feedback system and the difference between
single- and multi-player play, were also conducted.

To create the explicit training condition, 
short multiple-choice quizzes were created 
and delivered with definitions of the biases at 
relevant points in the game. For BBS, the bias
training and the quiz were given at the start of
the game; for CB, the bias training and quiz
were given at the start of Intel Review; and
for FAE, the bias training and quiz were given
before entering the Archive mini-game. The
purpose of each quiz was to ensure players 
read and understood the definition of each bias 
before continuing. If a player did not answer 
the quiz question correctly, the definition of the 
bias was repeated and they were given a new 
quiz question until they answered correctly, 
demonstrating their understanding of the bias 
definition. 
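
The quiz gate described above amounts to a simple loop: show the definition, ask a question, and repeat with a new question until the player answers correctly. The following sketch is an illustration under stated assumptions, not the game's implementation: `run_bias_quiz`, `get_answer`, and `show` are hypothetical names, and cycling back to the first question once the pool is exhausted is an assumption, since the paper does not say what happened in that case.

```python
import itertools


def run_bias_quiz(definition, questions, get_answer, show=print):
    """Repeat definition + question until one question is answered correctly.

    questions is a list of (prompt, correct_answer) pairs; get_answer is a
    callable that takes a prompt and returns the player's answer.
    Returns the number of attempts the player needed.
    """
    if not questions:
        raise ValueError("at least one quiz question is required")
    attempts = 0
    # Assumption: when the question pool runs out, start again from the top.
    for prompt, correct in itertools.cycle(questions):
        show(definition)  # shown initially and repeated after each miss
        attempts += 1
        if get_answer(prompt) == correct:
            return attempts


# Scripted player: misses the first question, then answers the second correctly.
scripted = iter(["b", "a"])
tries = run_bias_quiz(
    "Placeholder definition text for a bias.",
    [("Placeholder question 1", "a"), ("Placeholder question 2", "a")],
    get_answer=lambda prompt: next(scripted),
    show=lambda text: None,
)
print(tries)
```

Passing `get_answer` and `show` as callables keeps the loop independent of any particular user interface, so the same logic can be driven by a scripted player as in the usage example.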

Two experiments testing the effectiveness
of the explicit training version of the game were
conducted, the first of which consisted of a small
pilot study (N = 85) of a bias priming prototype,
in which game engagement was measured
physiologically (Dunbar, Jensen, et al., 2013).
The second consisted of a large-scale
study (N = 703) in which priming and other
independent variables were tested (Dunbar,
Miller, et al., 2013). As mentioned, the results
of these experiments are outside the scope of
this paper; however, both experiments provided
opportunities for gathering player feedback
about the effectiveness of MACBETH. In the
post-game survey of Experiment One, players
were given the option to provide feedback about
their experience, and 263 participants gave
open-ended responses. All the issues
within the responses were categorized as either
generally positive, negative, or neutral; of these,
50% were deemed positive, 66% negative, and
3% neutral, with some players giving both
positive and negative feedback.
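
The percentages above sum to more than 100% because a single response could be coded with more than one valence. The short sketch below shows how such a multi-label tally behaves, assuming the percentages are shares of responses. The function name `share_by_valence` is hypothetical and the ten responses are invented for illustration; they are not the study's data.

```python
def share_by_valence(responses):
    """Percent of responses that contain each valence label.

    responses is a list of sets, one per response, holding every label
    assigned to that response. Shares can sum to more than 100 because
    one response may carry several labels.
    """
    total = len(responses)
    labels = sorted(set().union(*responses))
    return {
        label: round(100 * sum(label in r for r in responses) / total)
        for label in labels
    }


# Invented example: 10 responses, two of them coded both positive and negative.
example = (
    [{"positive"}] * 3
    + [{"negative"}] * 4
    + [{"positive", "negative"}] * 2
    + [{"neutral"}]
)
print(share_by_valence(example))
```

In this invented example the shares are 50% positive, 60% negative, and 10% neutral, which total 120% for the same reason the reported figures exceed 100%.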

Most of these 263 responses addressed 
more than one theme, of which 12 were distinctly 
identified, representing 635 unique pieces of 
data. The positive feedback mostly focused 
on how enjoyable and fun the game was to 
play (41%), how engaging the game was (9%), 
and how much players liked the game design/ 
graphics (5%). For example, one player said, “I 
want to play it some more, it makes me think 
in a different way from most video games. I 
really enjoyed it.” Regarding the immersive 
elements of the game, this user also declared, 
“It was engaging and I feel like I learned a lot 
about FAE, BBS, and the difference between 
social dispositions and current environment.” 
Moreover, on the aesthetics of the game design,
this player commented: “Great graphics, one of
the best I’ve seen in these simulations, kudos!” 1

The negative feedback mostly focused 
on how confusing the game was (41%); time
constraints for playing the game (32%), which
were dictated by the IARPA rules; and frustration
with the tutorial (22%). For example,
one player’s feedback regarding confusion
about how to play the game was, “I feel that I
probably could have completed the game had 
I understood the controls and the flow of the 
game better. I spent a lot of time muddling about 
in menus that weren’t useful, because I wasn’t 
sure how to play.” 

Another player mentioned his frustration
with the time component of the game: “I didn’t
realize that the time was for every part of the
game and not just the subsections. So I didn’t
start to consider time as a variable until midway
through the third round.” Concerning Scenario
Zero, this player stated:

The tutorial moved kind of fast to know exactly
what we were required to do for the game. When
I got to the scenario to do it by myself I wasn’t
really sure where to start or what to do, so I
just clicked random buttons. I started to get
the hang of it towards the end of the game and
wished I could have kept playing.

Neutral feedback mainly consisted of
observations or statements about the game that
were difficult to interpret. For example, “I was
slow at first but then I was able to get faster”
and “The game grew increasingly harder as the
levels went on.” In addition, many players addressed
problems they encountered (17%) and
provided suggestions on how to improve the
game (10%). Most of the problems addressed
involved Scenario Zero and time restrictions;
however, other problems encountered included
the functionality of the controls and navigation,
as well as glitches found within the game. For
example, one user commented: “I struggled with
the functionality of the game. It was somewhat
confusing in the necessary order and operation of
the different menus.” This user described a bug
encountered while playing the game: “there was
a glitch in the Hawk game (third level, including
training). It said I missed points, but the image
was a reward, and then a popup awarded me
2 more points...” We used feedback like this
to find and repair problems within the game
feedback logic and to improve game playability.

One of the most common suggestions mentioned
was to include a back button feature in
the game (48%), e.g., “I would accidentally click
‘change hypothesis’ when actually I believe I 
had the correct hypothesis, and once that was 
done, there was no going back, I had to change 
something, even though I did not want to.” Other 
suggestions addressed the time restriction and 
the tutorial feature. For example, with regard to 
the time restriction this user suggested: 
Take the clock off for the training, I felt like I
had to quickly get through the tutorial to finish
that level on my own instead of the entire level
being the tutorial. Removing the clock would
remove pressure to quickly work through the
tutorial and be better prepared for the real level.

Regarding Scenario Zero, this user suggested:

Sometimes, I got lost in the middle of the game. 
For example, I had some trouble selecting and 
confirming the suspect, location, and weapon. 
Having the option to open the tutorial in the 
middle of the game would have helped. But 
other than that, I real[l]y enjoyed the game! 

In summary, players generally were positive
toward the game, with their negative feedback
largely addressing the time pressure within
the game, which was deliberately designed to
add cognitive load because the ability to process
information is one of the key variables in our
theoretical guide, the HSM. The time pressure
was designed to provoke players into both
committing and mitigating biased decisions. By
increasing time pressure, there was an increased
likelihood of players relying on heuristics, thus
providing opportunities within MACBETH for
them to learn about and correct their own biased
decision making.

In subsequent builds of the game, mentors 
have been provided to guide players along, so 
they can receive priming and feedback about 
their decision-making as they play MACBETH. 
To alleviate confusion, a “tool tip” bar has been
added and made prominent to indicate what the
various icons mean, and the first few scenarios
have been made simpler to solve so that, as players
learn to advance, the difficulty of these initial
scenarios will not impede the more important
bias-mitigating aspects of the game. We are
exploring other ways to reduce the confusion 
experienced within the game as we continue 
development and prepare for our next set of 
experiments. 


CONCLUSION 

This case study of MACBETH represents a
learning experience in which the rapid iterative
prototyping used by the development team
was informed by continuous consultation with
content and intelligence experts. Performing
extensive pilot testing with a range of playtesting
populations was essential for preparing, implementing,
and refining MACBETH in advance
of a large-scale experimental test. Development
of a training video game of this magnitude in
under one year is a feat in itself. However, the
key to its continued success will be a function
of the care and attention to detail given to
its design by the ISE, CSE and development
teams. Many of the lessons learned through
the playtesting were unique to this case study;
however, the value of playtesting MACBETH
and the agility of the rapid iterative prototyping
system have been critical for designing a
game capable of being examined within a true
experimental environment.

This general process of paper prototyping,
developing alpha builds, and conducting beta
testing may generalize to the design of
games for social impact. We believe
any team of game developers, regardless of 
their development budget, can enlist the help 
of subject matter experts and design a training 
game that meets multiple goals. Using the RIP 
process enables the project to come into focus 
as researchers and developers better understand 
the dynamics of the design space. The process 
enabled this project to systematically identify 
and reduce potential unknowns and risks. With
each phase, the project came increasingly
into focus. We were also able to adapt to player
feedback through the use of continuous feedback
processes so that changes could be made
before development advanced too far. Critical
to this process (and indeed any social impact 
game, or game for training) is also bringing 
a multi-disciplinary team into focus, as each 
understands a project’s multiple goals and 
competing needs. We recommend a team-based 
approach and the use of playtesting to garner
feedback for developers throughout the entire
development process to encourage the best
game development process possible.